3. Working with a World Bank Model under modelflow#

The basic method for working with any model is the same. Indeed the initial steps followed here are the same as were followed during the simple model discussion.

Process:

  1. Prepare the workspace

  2. Load the model into modelflow

  3. Design some scenarios

  4. Simulate the model

  5. Visualize the results

3.1. Prepare the work space#

Anytime that we want to use modelflow we must first import pandas and the modelclass into the python workspace.

# Prepare the notebook for use of modelflow 

# Jupyter magic command to improve the display of charts in the Notebook
%matplotlib inline

# Import pandas 
import pandas as pd

# Import the model class from the modelclass module 
from modelclass import model 

# functions that improve rendering of modelflow outputs
model.widescreen()
model.scroll_off();

3.2. Load the model: Load a pre-existing model, data and descriptions#

To load a model use the model.modelload() method of modelflow. In the example below, the model has been saved to the models folder located one level above the directory from which the Jupyter Notebook has been executed.

3.2.1. The .modelload() method#

The command below

mpak, bline = model.modelload(r'..\models\pak.pcim', alfa=0.7, run=1, keep='Baseline')

instantiates (creates an instance of) a modelflow model object and assigns it to the variable name mpak.

The run=1 option executes the model and assigns the result of the model execution to the dataframe bline.

The model is solved with the parameter alfa set to 0.7. The \(alfa \in (0,1)\) parameter determines the step size of the solution engine. The larger alfa the larger the step size. Larger step sizes may solve faster, but may have trouble finding a unique solution. Smaller step sizes take longer to solve but are more likely to find a unique solution. Values of alfa=.7 work well for World Bank models.
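
The role of a damping parameter such as alfa can be illustrated with a small stand-alone sketch. The code below is not modelflow's solver. It is a generic damped fixed-point iteration applied to a single made-up equation, x = 11 - 1.2x, and the function name solve_damped, the tolerance and the starting value are all assumptions chosen for illustration.

```python
def solve_damped(f, x0, alfa, tol=1e-10, max_iter=1000):
    """Iterate x <- alfa*f(x) + (1-alfa)*x until successive values agree.

    Returns (x, iterations, converged).
    """
    x = x0
    for iteration in range(1, max_iter + 1):
        # Damped update: move only a share alfa of the way towards f(x)
        x_new = alfa * f(x) + (1 - alfa) * x
        if abs(x_new - x) < tol:
            return x_new, iteration, True
        x = x_new
    return x, max_iter, False


def f(x):
    # Hypothetical one-equation "model" whose solution is x = 5
    return 11 - 1.2 * x


for alfa in (1.0, 0.7, 0.1):
    x, iterations, converged = solve_damped(f, x0=1.0, alfa=alfa)
    print(f"alfa={alfa}: converged={converged}, iterations={iterations}")
```

In this toy example the full step (alfa=1.0) oscillates with growing amplitude and never converges, alfa=0.7 converges, and alfa=0.1 also converges but needs more iterations.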

The keep option instructs modelflow to retain in the model object (mpak) the results of the initial scenario, assigning them the text name Baseline. As written, modelload returns both the model object mpak and a dataframe bline that holds the results of the simulation. This dataframe is distinct from the one stored inside the mpak model object by the keep option, although the two contain the same numerical values. The keep option is described in more detail in the following chapter on scenarios.

Warning

If modelflow cannot find the file at the location indicated, it will look for it in the global model repository online.

The modelload command reports where the loaded model was retrieved from. In this case it was read from the requested local file.

# Replace the path below with the location of the pak.pcim file (or some other World Bank model file) on your computer
mpak, bline = model.modelload(r'..\models\pak.pcim',
                              alfa=0.7, run=1, keep='Baseline')
file read:  C:\modelflow manual\papers\mfbook\content\models\pak.pcim

3.2.2. Extracting information about the model#

The newly loaded python object mpak is an instance of the model class and as such inherits the methods (functions) and properties (data) of that class. To learn about the model there are a variety of methods that can be used to extract information about the model and its data.

A World Bank model in modelflow will contain a wide range of objects.

  • variables – time series variables comprised of mnemonics and data

  • dataframes – data for each variable generated in different simulations

  • groups – lists of variables

  • equations – identities and behaviourals

  • model – the model object itself

Extracting information about each of these objects is central to working with WBG models in modelflow.

The model object contains information about the model itself, its name, its structure (does it contain simultaneous equations or is it recursive), the number of variables it contains and the number that are exogenous and endogenous (have associated equations). Executing the unadorned name of a model object, i.e. mpak displays summary information about the model object.

mpak
<
Model name                              :                  PAK 
Model structure                         :         Simultaneous 
Number of variables                     :                  839 
Number of exogeneous  variables         :                  461 
Number of endogeneous variables         :                  378 
>

The model work space also has a time dimension, its sample period. This can be retrieved and changed.

mpak.current_per
Int64Index([2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026,
            2027, 2028, 2029, 2030],
           dtype='int64')

Here the model is currently set up to solve over the period 2016 through 2030. That period can be changed, assuming, as is the case with the Pakistan model, that additional data are available.

3.2.3. Information about variables#

The model object mpak contains lists of all the variables that form part of the model, and these lists can be interrogated to garner information about the model. The Table below indicates some of the most important of these queries. The variables for which information is sought can be specified directly or through a wildcard specification (see note).

| Method | Example | Information returned |
|---|---|---|
| .names | modelname['PAKNECON*XN'].names | Python list of the mnemonics of all variables in the model object that match the search pattern in the [] |
| .des | modelname['PAKNECONPRVT?N'].des | Dictionary of mnemonics and their variable descriptions |
| .desc | modelname['PAKNECONPRVTXN'].desc | List of variable descriptions alone |
| .<var name>.show | modelname.PAKNECONPRVTXN.show | Lists the equation (formula), variable descriptions and variable values |

Note

Wildcards

Most of the information commands accept wildcard specifications in the search parameter.

The * character in the command mpak['PAKNECON*XN'].names example is a wildcard character and the expression will return all variables that begin PAKNECON and end XN.

The ? in the .des example is another wildcard. It matches exactly one character. Thus mpak['PAKNECONPRVT?N'].names would return three variables: PAKNECONPRVTCN, PAKNECONPRVTKN, and PAKNECONPRVTXN. These are the current-value, real, and deflator series for household consumption expenditure.

Note the final show example uses a slightly different syntax where the variable to be operated upon is specified directly: modelname.PAKNECONPRVTXN.show.
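
The wildcard rules described in this note behave like shell-style patterns. The sketch below reproduces them with Python's standard fnmatch module on a short list of mnemonics taken from this chapter. It only illustrates the matching rules and is not how modelflow implements its search. The helper name match is hypothetical, and case-sensitive matching is an assumption.

```python
from fnmatch import fnmatchcase

# A handful of mnemonics taken from the examples in this chapter
mnemonics = [
    'PAKNECONENGYXN', 'PAKNECONGOVTXN', 'PAKNECONOTHRXN',
    'PAKNECONPRVTCN', 'PAKNECONPRVTKN', 'PAKNECONPRVTXN',
    'PAKNYGDPMKTPKN',
]


def match(pattern, names):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


star_matches = match('PAKNECON*XN', mnemonics)        # * matches any run of characters
single_matches = match('PAKNECONPRVT?N', mnemonics)   # ? matches exactly one character
print(star_matches)
print(single_matches)
```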

The example below returns the mnemonics and descriptions of all variables matching the pattern PAKNYGDP*KN, i.e. Pakistani variables from the National Income Accounts from the main sub-category GDP that are also real variables.

mpak['PAKNYGDP*KN'].des
PAKNYGDPDISCKN : GDP Disc., 2000 LCU mn
PAKNYGDPFCSTKN : GDP Factor Cost Local Currency units Volumes National base year
PAKNYGDPMKTPKN : Real GDP
PAKNYGDPPOTLKN : Potential Output, constant LCU

Box 4. World Bank Mnemonics

A typical World Bank model will have in excess of 300 variables. Each has a mnemonic, typically comprised of 14 characters, that is structured in a specific way. The root for almost all is the three-letter ISO code of the country to which the variable pertains (see discussion in section).

\[\texttt{12345678901234}\]
\[\color{green}{\texttt{CCC}}\color{red}{\texttt{AA}}\color{lime}{\texttt{MMM}}\color{blue}{\texttt{NNNN}}\color{magenta}{\texttt{U}}\color{black}{\texttt{C}}\]

where:

| Letters | Meaning |
|---|---|
| \(\color{green}{\texttt{CCC}}\) | The three-letter ISO code for a country, e.g. IDN for Indonesia, RUS for Russia |
| \(\color{red}{\texttt{AA}}\) | The two-letter major accounting system to which the variable attaches |
| \(\color{lime}{\texttt{MMM}}\) | The three-letter major sub-category of the data, e.g. GDP, EXP (expenditure) |
| \(\color{blue}{\texttt{NNNN}}\) | The four-letter minor sub-category, e.g. MKTP for market prices |
| \(\color{magenta}{\texttt{U}}\) | The measure (K: real variable; C: current values; X: prices) |
| \(\color{black}{\texttt{C}}\) | The currency (N: national currency; D: USD; P: PPP) |

Occasionally you will see variables with an ‘_’ appended to the name. This indicates that the variable is expressed as a percent of something (usually GDP). Thus PAKBNCABFUNDCD_ means Pakistan Balance (BN) of the Current account (CAB), IMF definition (FUND), in Current (C) Dollars (D), expressed as a percent of GDP.

Other terminators include ER (effective rate) and SR (statutory rate).

Common major accounting system mnemonics (the \(\color{red}{\texttt{AA}}\)s from above) include:

| Code | Meaning |
|---|---|
| NY | National income |
| NE | National expenditure accounts |
| NV | Value added accounts |
| GG | General government accounts |
| BX | Balance of payments: exports |
| BM | Balance of payments: imports |
| BN | Balance of payments: net |
| BF | Balance of payments: financial account |

Thus

| Mnemonic | Meaning |
|---|---|
| IDNNYGDPMKTPKN | Indonesia GDP at market prices, real, in Indonesian rupiah |
| KENNECONPRVTXN | Kenya private (household) consumption expenditure deflator, Kenyan shillings |
| BOLGGEXPGNFSCN | Bolivia government expenditure on goods and services (GNFS) in current bolivianos |
| HRVGGREVDCITCN | Croatia government revenues, direct corporate income taxes, in current euros |
| NPLBXGSRNFSVCD | Nepal BOP exports of non-factor services (goods and services) in current USD |
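
The positional structure of a mnemonic lends itself to simple string slicing. The helper below, split_mnemonic, is a hypothetical illustration and not part of modelflow. It assumes a standard 14-character mnemonic and uses only the measure and currency codes listed in this box.

```python
MEASURES = {'K': 'real variable', 'C': 'current values', 'X': 'prices'}
CURRENCIES = {'N': 'national currency', 'D': 'USD', 'P': 'PPP'}


def split_mnemonic(mnemonic):
    """Split a standard 14-character mnemonic into its CCC AA MMM NNNN U C parts."""
    if len(mnemonic) != 14:
        raise ValueError(f'expected 14 characters, got {len(mnemonic)}: {mnemonic!r}')
    return {
        'country': mnemonic[0:3],
        'accounting_system': mnemonic[3:5],
        'major_category': mnemonic[5:8],
        'minor_category': mnemonic[8:12],
        # Unknown codes are passed through unchanged rather than guessed
        'measure': MEASURES.get(mnemonic[12], mnemonic[12]),
        'currency': CURRENCIES.get(mnemonic[13], mnemonic[13]),
    }


parts = split_mnemonic('IDNNYGDPMKTPKN')
print(parts)
```

Mnemonics that carry a trailing ‘_’ have 15 characters and are rejected by this sketch, so they would need separate handling.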

The command mpak['*'].des will return a dictionary of all the mnemonics and descriptions of all the variables in the mpak model object – a list that would run to 100s of variables.

3.2.3.1. The ! operator – searching on the variable description#

The ! operator allows the same methods to be used to retrieve information about variables based on their descriptions. Prepending the search string with the ! operator tells modelflow to match (and display) variables based on their descriptions rather than their mnemonics.

Note

The ! operator: if a search pattern is preceded by an exclamation mark (!), the search is carried out over the descriptions of variables instead of their mnemonics.

The below expression returns all variables whose description includes the word Carbon.

mpak['!*Carbon*'].names
['PAKGGREVCO2CER', 'PAKGGREVCO2GER', 'PAKGGREVCO2OER']
mpak['!*Carbon*'].des
PAKGGREVCO2CER : Carbon tax on coal (USD/t)
PAKGGREVCO2GER : Carbon tax on gas (USD/t)
PAKGGREVCO2OER : Carbon tax on oil (USD/t)
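
The idea of switching the search from mnemonics to descriptions can be sketched with a plain dictionary. The function search below is hypothetical and uses Python's fnmatch module as a stand-in for modelflow's own matching. The descriptions are copied from outputs shown in this chapter.

```python
from fnmatch import fnmatchcase

# Mnemonic -> description pairs copied from the outputs shown in this chapter
descriptions = {
    'PAKGGREVCO2CER': 'Carbon tax on coal (USD/t)',
    'PAKGGREVCO2GER': 'Carbon tax on gas (USD/t)',
    'PAKGGREVCO2OER': 'Carbon tax on oil (USD/t)',
    'PAKNYGDPMKTPKN': 'Real GDP',
}


def search(pattern, descriptions):
    """Match on descriptions if the pattern starts with '!', otherwise on mnemonics."""
    if pattern.startswith('!'):
        text_pattern = pattern[1:]
        return [name for name, text in descriptions.items()
                if fnmatchcase(text, text_pattern)]
    return [name for name in descriptions if fnmatchcase(name, pattern)]


by_description = search('!*Carbon*', descriptions)
by_mnemonic = search('PAKNY*', descriptions)
print(by_description)
print(by_mnemonic)
```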

3.3. Groups#

Modelflow incorporates a variant of the idea of groups from EViews. In modelflow the groups defined in an imported EViews workfile are converted into entries in a dictionary called var_groups which can be interrogated, added to and amended like any dictionary in python.

The command mpak.var_groups will return all of the groups already defined in mpak.

mpak.var_groups
{'Headline': '???GDPpckn ???NRTOTLCN ???LMEMPTOTL ???BFFINCABDCD  ???BFBOPTOTLCD ???GGBALEXGRCN ???BNCABLOCLCD_ ???FPCPITOTLXN',
 'National income accounts': '???NY*',
 'National expenditure accounts': '???NE*',
 'Value added accounts': '???NV*',
 'Balance of payments exports': '???BX*',
 'Balance of payments exports and value added ': '???BX* ???NV*',
 'Balance of Payments Financial Account': '???BF*',
 'General government fiscal accounts': '???GG*',
 'World all': 'WLD*',
 'PAK all': 'PAK*'}

A group can be added to the dictionary by giving it a unique identifier (key) and associating with it a string defining the group, using a wildcard specification or just a space de-limited list of mnemonics.

Thus the command

mpak.var_groups['Mygroup']='PAKGGREV*CN PAKGGBALOVRLCN'
                

will add to the dictionary var_groups, which is part of the model object mpak, a new group called ‘Mygroup’ that contains all variables beginning PAKGGREV and ending CN, plus the variable PAKGGBALOVRLCN.

mpak['#Mygroup'].names
['PAKGGREVDRCTCN',
 'PAKGGREVEMISCN',
 'PAKGGREVGNFSCN',
 'PAKGGREVGRNTCN',
 'PAKGGREVOTHRCN',
 'PAKGGREVTOTLCN',
 'PAKGGREVTRDECN',
 'PAKGGBALOVRLCN']
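
A group definition is just a string of space-delimited patterns, so expanding it amounts to matching each pattern in turn against the available mnemonics. The sketch below shows one way this could work. The function expand_group and the short list of names are hypothetical, and a plain dictionary stands in for mpak.var_groups.

```python
from fnmatch import fnmatchcase


def expand_group(definition, names):
    """Expand a space-delimited string of mnemonics and wildcard patterns."""
    selected = []
    for pattern in definition.split():
        for name in names:
            # Keep the first occurrence only, in pattern order
            if fnmatchcase(name, pattern) and name not in selected:
                selected.append(name)
    return selected


var_groups = {'Mygroup': 'PAKGGREV*CN PAKGGBALOVRLCN'}   # stands in for mpak.var_groups
names = ['PAKGGBALOVRLCN', 'PAKGGREVDRCTCN', 'PAKGGREVTOTLCN',
         'PAKGGREVDRCTXN', 'PAKNYGDPMKTPKN']

members = expand_group(var_groups['Mygroup'], names)
print(members)
```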

3.4. Information about data#

The unadorned command mpak['#Mygroup'] invokes a widget that shows all of the data in the group Mygroup in various representations (levels and growth rates), both as tables and charts.

mpak['#Mygroup']
group output widget

Alternatively, just the tables or just the charts can be returned by appending .df (tables) or .plot() (charts). Adding .pct (to compute growth rates) and .mul100 (to multiply them by 100) before .df or .plot() displays the data as percentage growth rates.

mpak['#Mygroup'].df
PAKGGREVDRCTCN PAKGGREVEMISCN PAKGGREVGNFSCN PAKGGREVGRNTCN PAKGGREVOTHRCN PAKGGREVTOTLCN PAKGGREVTRDECN PAKGGBALOVRLCN
2016 1.192249e+06 -187487.145981 1.319524e+06 28696.665845 1.704982e+06 4.463113e+06 4.051485e+05 -1.322586e+06
2017 1.354036e+06 -368998.340222 1.327860e+06 25349.188640 2.141104e+06 4.976845e+06 4.974946e+05 -1.833428e+06
2018 1.492389e+06 -373989.822454 1.516485e+06 49838.249034 2.505467e+06 5.802066e+06 6.118760e+05 -1.814775e+06
2019 1.721883e+06 -387328.505789 1.764865e+06 78978.628057 3.033525e+06 6.967846e+06 7.559233e+05 -1.764188e+06
2020 1.950849e+06 -391591.257701 1.998747e+06 110163.222303 3.574406e+06 8.136424e+06 8.938484e+05 -1.798450e+06
2021 2.178938e+06 -392424.135300 2.225111e+06 142678.758083 4.122857e+06 9.307621e+06 1.030460e+06 -1.857821e+06
2022 2.407303e+06 -393527.268226 2.450129e+06 176071.657561 4.677542e+06 1.048922e+07 1.171700e+06 -1.939507e+06
2023 2.644233e+06 -396768.488814 2.685256e+06 210616.944739 5.252367e+06 1.171840e+07 1.322695e+06 -2.033015e+06
2024 2.894933e+06 -402416.005463 2.936720e+06 246606.675733 5.856855e+06 1.301870e+07 1.486000e+06 -2.146245e+06
2025 3.161598e+06 -409971.055842 3.206328e+06 284195.080753 6.495230e+06 1.439974e+07 1.662364e+06 -2.279101e+06
2026 3.444363e+06 -418754.038483 3.493313e+06 323384.803563 7.167704e+06 1.586160e+07 1.851585e+06 -2.433659e+06
2027 3.743124e+06 -428241.870277 3.796738e+06 364156.625952 7.874000e+06 1.740321e+07 2.053431e+06 -2.610006e+06
2028 4.058704e+06 -438151.421766 4.116861e+06 406583.534824 8.615802e+06 1.902797e+07 2.268174e+06 -2.806904e+06
2029 4.393195e+06 -448385.653605 4.455458e+06 450879.135036 9.397577e+06 2.074544e+07 2.496720e+06 -3.022765e+06
2030 4.749689e+06 -458938.008033 4.815425e+06 497380.143807 1.022607e+07 2.257011e+07 2.740488e+06 -3.256688e+06
round(mpak['#Mygroup'].pct.mul100.df,2) # round restricts the display to 2 decimal points
PAKGGREVDRCTCN PAKGGREVEMISCN PAKGGREVGNFSCN PAKGGREVGRNTCN PAKGGREVOTHRCN PAKGGREVTOTLCN PAKGGREVTRDECN PAKGGBALOVRLCN
2016 14.64 -49.58 21.19 28.28 -8.72 12.89 32.34 -7.79
2017 13.57 96.81 0.63 -11.67 25.58 11.51 22.79 38.62
2018 10.22 1.35 14.21 96.61 17.02 16.58 22.99 -1.02
2019 15.38 3.57 16.38 58.47 21.08 20.09 23.54 -2.79
2020 13.30 1.10 13.25 39.48 17.83 16.77 18.25 1.94
2021 11.69 0.21 11.33 29.52 15.34 14.39 15.28 3.30
2022 10.48 0.28 10.11 23.40 13.45 12.69 13.71 4.40
2023 9.84 0.82 9.60 19.62 12.29 11.72 12.89 4.82
2024 9.48 1.42 9.36 17.09 11.51 11.10 12.35 5.57
2025 9.21 1.88 9.18 15.24 10.90 10.61 11.87 6.19
2026 8.94 2.14 8.95 13.79 10.35 10.15 11.38 6.78
2027 8.67 2.27 8.69 12.61 9.85 9.72 10.90 7.25
2028 8.43 2.31 8.43 11.65 9.42 9.34 10.46 7.54
2029 8.24 2.34 8.22 10.89 9.07 9.03 10.08 7.69
2030 8.11 2.35 8.08 10.31 8.82 8.80 9.76 7.74
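
The growth rates in the table can be checked by hand from the levels. The sketch below assumes that .pct.mul100 computes the period-on-period percent change, which is consistent with the numbers shown. It uses plain Python rather than modelflow, and the function name pct_growth is hypothetical.

```python
# PAKGGREVTOTLCN levels for 2019-2021, copied from the table of levels above
levels = {2019: 6.967846e+06, 2020: 8.136424e+06, 2021: 9.307621e+06}


def pct_growth(series):
    """Percent change from the previous period, for every period that has a predecessor."""
    years = sorted(series)
    return {
        year: (series[year] / series[previous] - 1) * 100
        for previous, year in zip(years, years[1:])
    }


growth = pct_growth(levels)
for year, value in growth.items():
    print(year, round(value, 2))
```

The results agree with the 2020 and 2021 entries for PAKGGREVTOTLCN in the growth-rate table to two decimal places.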

Below the command has been placed inside a with mpak.set_smpl() clause to restrict the output to a shorter period. If it was not used the output would cover the whole time period of the .lastdf DataFrame from which all of this data is drawn.

Note

When using a with clause, an explicit print statement is required.

with mpak.set_smpl(2020,2030):
    print(round(mpak['#Mygroup'].pct.mul100.df,2))
      PAKGGREVDRCTCN  PAKGGREVEMISCN  PAKGGREVGNFSCN  PAKGGREVGRNTCN  \
2020           13.30            1.10           13.25           39.48   
2021           11.69            0.21           11.33           29.52   
2022           10.48            0.28           10.11           23.40   
2023            9.84            0.82            9.60           19.62   
2024            9.48            1.42            9.36           17.09   
2025            9.21            1.88            9.18           15.24   
2026            8.94            2.14            8.95           13.79   
2027            8.67            2.27            8.69           12.61   
2028            8.43            2.31            8.43           11.65   
2029            8.24            2.34            8.22           10.89   
2030            8.11            2.35            8.08           10.31   

      PAKGGREVOTHRCN  PAKGGREVTOTLCN  PAKGGREVTRDECN  PAKGGBALOVRLCN  
2020           17.83           16.77           18.25            1.94  
2021           15.34           14.39           15.28            3.30  
2022           13.45           12.69           13.71            4.40  
2023           12.29           11.72           12.89            4.82  
2024           11.51           11.10           12.35            5.57  
2025           10.90           10.61           11.87            6.19  
2026           10.35           10.15           11.38            6.78  
2027            9.85            9.72           10.90            7.25  
2028            9.42            9.34           10.46            7.54  
2029            9.07            9.03           10.08            7.69  
2030            8.82            8.80            9.76            7.74  
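
The with clause works because set_smpl acts as a context manager. The sketch below shows the general pattern with a toy class. It assumes, as is typical for context managers of this kind, that the previous sample is restored when the block ends. ToyModel is hypothetical and is not modelflow's implementation.

```python
from contextlib import contextmanager


class ToyModel:
    """Hypothetical stand-in for a model object that has an active sample period."""

    def __init__(self, start, end):
        self.current_per = (start, end)

    @contextmanager
    def set_smpl(self, start, end):
        previous = self.current_per          # remember the active sample
        self.current_per = (start, end)      # narrow it inside the with block
        try:
            yield self
        finally:
            self.current_per = previous      # restore it on exit, even after an error


toy = ToyModel(2016, 2030)
with toy.set_smpl(2020, 2030):
    inside = toy.current_per
after = toy.current_per
print(inside, after)
```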

When a dataframe has many rows, Jupyter will, by default, truncate the display, showing only the first and last five observations. This is what happens below when the same call is made without the with clause, over a much longer sample period.

mpak.smpl(2000,2100)  # change the active sample period to 2000 through 2100 (101 observations)
mpak['#Mygroup'].pct.mul100.df  # Jupyter will truncate the output
PAKGGREVDRCTCN PAKGGREVEMISCN PAKGGREVGNFSCN PAKGGREVGRNTCN PAKGGREVOTHRCN PAKGGREVTOTLCN PAKGGREVTRDECN PAKGGBALOVRLCN
2000 9.550328 101.829915 70.016016 NaN NaN 7.298335 -21.682305 15.136903
2001 11.138391 15.374786 31.458374 inf inf 16.344272 5.519481 -35.569074
2002 14.660537 -13.232471 8.545928 94.822463 17.589007 22.839281 -26.435385 -19.000008
2003 7.111796 35.469443 17.116998 -36.962342 15.198571 6.038960 43.955079 23.549681
2004 8.425066 21.635646 13.051789 -39.977372 26.297036 15.683372 32.113024 -19.924674
... ... ... ... ... ... ... ... ...
2096 9.025284 2.844796 9.066803 9.025327 9.025299 9.027988 8.957897 9.027308
2097 9.021221 2.842709 9.063564 9.021258 9.021234 9.024111 8.955230 9.023692
2098 9.017159 2.840601 9.060286 9.017190 9.017170 9.020235 8.952564 9.020031
2099 9.013108 2.838480 9.056979 9.013134 9.013117 9.016371 8.949905 9.016341
2100 9.009075 2.836350 9.053653 9.009098 9.009083 9.012525 8.947260 9.012634

101 rows × 8 columns

mpak['#Mygroup'].pct.mul100.plot(title="Plot of Mygroup\ngrowth rates");
Figure: Plot of Mygroup growth rates

3.4.1. Some examples#

Results for more than one variable can also be displayed using the wildcard search methods described earlier.

3.4.1.1. .names property#

mpak['PAKNECON*XN'].names

Returns the names (mnemonics) of all variables that begin PAKNECON and end XN, i.e. price deflators for various types of consumption demand.

mpak['PAKNECON*XN'].names
['PAKNECONENGYXN', 'PAKNECONGOVTXN', 'PAKNECONOTHRXN', 'PAKNECONPRVTXN']

3.4.1.2. The .des property#

mpak['PAKNECONPRVT?N'].des

Returns a dictionary comprised of the mnemonics and the descriptions of all the variables that begin PAKNECONPRVT and end N, but have only one character between the T and the N.

mpak['PAKNECONPRVT?N'].des
PAKNECONPRVTCN : Pvt. Cons., LCU mn
PAKNECONPRVTKN : HH. Cons Real
PAKNECONPRVTXN : Implicit LCU defl., Pvt. Cons., 2000 = 1

3.4.1.3. .var_description method#

The property .var_description returns the descriptions of all variables. Indexed with a specific mnemonic, it returns the description of that one variable. It does not accept wildcards.

#mpak.var_description # returns the descriptions for all variables
mpak.var_description['PAKNYGDPMKTPCN'] # returns the description of a specific variable
'GDP, Market Prices, LCU mn'

3.5. Information about equations#

Information about specific equations can also be extracted and displayed.

3.5.1. The endogene property#

The endogene property returns the set of all variables in the model that are endogenous (have an equation). It can also be used to test whether a specific mnemonic has an equation associated with it.

Because endogene is a set, it is sorted here before slicing. For brevity only the first 5 elements are shown below.

sorted(mpak.endogene)[:5]
['CHNEXR05', 'CHNPCEXN05', 'DEUEXR05', 'DEUPCEXN05', 'FRAEXR05']

The expression 'PAKNECONPRVTKN' in mpak.endogene returns True if the passed mnemonic is in the set returned by mpak.endogene.

'PAKNECONPRVTKN' in mpak.endogene
True

3.5.2. Retrieving info on equations#

There are three functions to extract the equations from a model.

| Command | Effect |
|---|---|
| mpak['PAKNECONPRVTKN'].frml | Returns a normalized version of the equation (the one actually used in modelflow) |
| mpak['PAKNECONPRVTKN'].eviews | In models imported from EViews, reports the original EViews specification |
| mpak.PAKNECONPRVTXN.show | Displays the equation (formula), variable descriptions, and variable values |

3.5.3. The .eviews method#

The mpak['PAKNECONPRVTKN'].eviews command returns the equations before they were normalized. In most cases this is a slightly more legible form. Here following the EViews syntax, \(\Delta ln()\) is written as dlog().

mpak['PAKNECONPRVTKN'].eviews
PAKNECONPRVTKN : DLOG(PAKNECONPRVTKN) =- 0.2*(LOG(PAKNECONPRVTKN( - 1)) - LOG(1.21203101101442) - LOG((((PAKBXFSTREMTCD( - 1) - PAKBMFSTREMTCD( - 1))*PAKPANUSATLS( - 1)) + PAKGGEXPTRNSCN( - 1) + PAKNYYWBTOTLCN( - 1)*(1 - PAKGGREVDRCTXN( - 1)/100))/PAKNECONPRVTXN( - 1))) + 0.763938860758873*DLOG((((PAKBXFSTREMTCD - PAKBMFSTREMTCD)*PAKPANUSATLS) + PAKGGEXPTRNSCN + PAKNYYWBTOTLCN*(1 - PAKGGREVDRCTXN/100))/PAKNECONPRVTXN) - 0.0634474791568939*@DURING("2009") - 0.3*(PAKFMLBLPOLYXN/100 - DLOG(PAKNECONPRVTXN))

3.5.4. The .frml method#

The .frml method returns the normalized equation that is actually used in modelflow.

In this instance the variable to be displayed is referenced directly (not as the result of a search operation using the ['partial*variablename'] syntax).

Following the normalized equation is a listing of all the variables that appear in the equation and their descriptions.

mpak.PAKNECONPRVTKN.frml
Endogeneous: PAKNECONPRVTKN: HH. Cons Real
Formular: FRML <DAMP,STOC> PAKNECONPRVTKN = (PAKNECONPRVTKN(-1)*EXP(PAKNECONPRVTKN_A+ (-0.2*(LOG(PAKNECONPRVTKN(-1))-LOG(1.21203101101442)-LOG((((PAKBXFSTREMTCD(-1)-PAKBMFSTREMTCD(-1))*PAKPANUSATLS(-1))+PAKGGEXPTRNSCN(-1)+PAKNYYWBTOTLCN(-1)*(1-PAKGGREVDRCTXN(-1)/100))/PAKNECONPRVTXN(-1)))+0.763938860758873*((LOG((((PAKBXFSTREMTCD-PAKBMFSTREMTCD)*PAKPANUSATLS)+PAKGGEXPTRNSCN+PAKNYYWBTOTLCN*(1-PAKGGREVDRCTXN/100))/PAKNECONPRVTXN))-(LOG((((PAKBXFSTREMTCD(-1)-PAKBMFSTREMTCD(-1))*PAKPANUSATLS(-1))+PAKGGEXPTRNSCN(-1)+PAKNYYWBTOTLCN(-1)*(1-PAKGGREVDRCTXN(-1)/100))/PAKNECONPRVTXN(-1))))-0.0634474791568939*DURING_2009-0.3*(PAKFMLBLPOLYXN/100-((LOG(PAKNECONPRVTXN))-(LOG(PAKNECONPRVTXN(-1)))))) )) * (1-PAKNECONPRVTKN_D)+ PAKNECONPRVTKN_X*PAKNECONPRVTKN_D  $

PAKNECONPRVTKN  : HH. Cons Real
DURING_2009     : 
PAKBMFSTREMTCD  : Imp., Remittances (BOP), US$ mn
PAKBXFSTREMTCD  : Exp., Remittances (BOP), US$ mn
PAKFMLBLPOLYXN  : Key Policy Interest Rate
PAKGGEXPTRNSCN  : Current Transfers
PAKGGREVDRCTXN  : Direct Revenue Tax Rate
PAKNECONPRVTKN_A: Add factor:HH. Cons Real
PAKNECONPRVTKN_D: Fix dummy:HH. Cons Real
PAKNECONPRVTKN_X: Fix value:HH. Cons Real
PAKNECONPRVTXN  : Implicit LCU defl., Pvt. Cons., 2000 = 1
PAKNYYWBTOTLCN  : Total Wage Bill
PAKPANUSATLS    : Exchange rate LCU / US$ - Pakistan
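
Comparing the two outputs shows what normalization does here: the EViews form has DLOG(y) on the left-hand side, while the normalized form solves for the level of y as y(-1)*EXP(...). The sketch below checks that the two are equivalent for made-up numbers. The function name normalized_level and the values are assumptions for illustration only.

```python
import math


def normalized_level(previous_level, rhs):
    """Solve DLOG(y) = rhs for the level of y, given last period's level."""
    return previous_level * math.exp(rhs)


previous_level = 100.0     # hypothetical value of y(-1)
rhs = 0.03                 # hypothetical value of the right-hand side (about 3 percent growth)

level = normalized_level(previous_level, rhs)
# Recover the left-hand side of the original (EViews) form from the level
dlog = math.log(level) - math.log(previous_level)
print(level, dlog)
```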

3.6. Special features of Behavioural equations in MFMod#

If you look carefully at the output from .frml above, you will note that it includes three special variables that are not part of the .eviews output. These variables each have the same root mnemonic as the dependent variable PAKNECONPRVTKN but have special terminators appended to them:

| Terminator | Meaning / role |
|---|---|
| _A | Add factor: special variable that allows judgment to be added to an equation |
| _X | Exogenized value: special variable that stores the value that the equation should return if exogenized |
| _D | Exogenous dummy: when set to one, the equation returns the value of the _X variable; when zero, it returns the fitted value of the equation plus the add factor |

3.6.1. The _D and _X terminators in behavioural equations#

Recall that a behavioural equation determines the value of an endogenous variable based on an econometric relationship rather than an accounting identity. It is comprised of a left-hand side variable (the regressand or dependent variable), right-hand side variables (the regressors or explanatory variables), estimated parameters, perhaps some imposed parameters, and an error term. Assume \(y_t\) is the dependent variable, \(X_t\) a vector of explanatory variables and \(\eta_t\) the error term, then a simple regression can be written as:

\[y_t = \alpha + \beta X_t + \eta_t\]

where \(\alpha\) and \(\beta\) are parameters to be estimated (in the case of \(\beta\), a vector of parameters). Written following estimation we have

\[y_t = \hat{\alpha} + \hat{\beta} X_t + \hat{\eta_t}\]

where the hats “^” denote the specific values of the parameters that emerged from the estimation process, and \(\hat{\eta_t}\) is the estimated residual. We can also write an expression for \(\hat{y}_t\), the fitted value from the regression, as:

\[\hat{y}_t = \hat{\alpha} + \hat{\beta} X_t \]

Substituting this expression into the previous expression and re-arranging gives us

\[y_t-\hat{y}_t= \hat{\eta_t}\]

All of these are fairly elementary results from econometrics.
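
These relationships can be verified numerically. The sketch below fits a one-regressor regression by ordinary least squares on made-up data, then computes the fitted values and the residuals as the difference between actual and fitted values. The data and the function name ols are assumptions for illustration.

```python
def ols(x, y):
    """Ordinary least squares for one regressor: returns (alpha_hat, beta_hat)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    alpha_hat = mean_y - beta_hat * mean_x
    return alpha_hat, beta_hat


# Made-up observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = ols(x, y)
fitted = [alpha_hat + beta_hat * xi for xi in x]        # y-hat
residuals = [yi - fi for yi, fi in zip(y, fitted)]      # eta-hat = y - y-hat
print(alpha_hat, beta_hat, residuals)
```

Because the regression includes a constant, the estimated residuals sum to zero over the estimation sample (up to rounding error).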

3.6.2. The add factor in behavioral equations#

In the forecast period, the expected value of the error term, \(E(\eta_t)\), is zero. So the econometric representation of the above equation during the forecast period is

\[\begin{align*} y_t-\hat{y}_t &= E (\hat{\eta_t}) \\ y_t-\hat{y}_t &= 0 \\ \end{align*}\]

In macrostructural models the first of these equations is rewritten by substituting \(AF_t\) for \(\hat{\eta}_t\).

\[y_t= \hat{y}_t + AF_t\]

By imposing a non-zero value on \(AF_t\), the modeller can add her judgment to the model’s fitted value, either to reflect a view that the forecast value of y will deviate from the fitted value, or because some change in circumstances will cause the underlying equation to be different in the future.

In World Bank models using modelflow, the add factor of an equation is given the same mnemonic as the dependent variable with an _A appended to it. Thus, in the above simplified version, the equation would be written as

\[y_t = \hat{\alpha} + \hat{\beta} X_t + y\_A_t\]

3.6.3. Excluding behavioural equations#

In modelflow, behavioural equations can be excluded or “de-activated” (see following discussion). This is achieved by adding two additional variables to each equation. The first is given the name of the dependent variable with _D appended. The second is given the name of the dependent variable with _X appended.

The preceding equation is then re-written as below

\[\begin{equation*} y_t = (1-y\_D_t)\cdot\underbrace{\biggl[\hat{\alpha} + \hat{\beta} X_t + y\_A_t\biggr]}_{\text{Econometric equation}} + y\_D_t\cdot \underbrace{y\_X_t}_{\stackrel{\text{Exogenized}}{\text{value}}} \end{equation*}\]

When \(y\_D_t\) = 0, the second part of the equation \(y\_D_t*y\_X_t\) evaluates to zero and drops out, while the expression \((1-y\_D_t)\) evaluates to one. Thus the whole equation simplifies to the standard behavioural equation.

\[\begin{align*} y_t &= 1\cdot\biggl[\hat{\alpha} + \hat{\beta} X_t + y\_A_t\biggr]+ 0\\ y_t &= \hat{\alpha} + \hat{\beta} X_t + y\_A_t \end{align*}\]

When \(y\_D_t\) = 1, the expression \((1-y\_D_t)\) evaluates to zero and the first part of the equation drops out, so the equation simplifies to:

\[\begin{align*} y_t &= 0\cdot\biggl[\hat{\alpha} + \hat{\beta} X_t + y\_A_t\biggr]+ 1\cdot y\_X_t\\ y_t &= y\_X_t\\ \end{align*}\]

Thus the whole equation simply sets the endogenous variable \(y_t\) equal to the exogenous variable \(y\_X_t\).
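
The switching logic is easy to check with numbers. The sketch below evaluates the combined equation for both settings of the dummy. The function name behavioural_value and all the values are hypothetical.

```python
def behavioural_value(fitted, add_factor, dummy, exog_value):
    """y = (1 - y_D) * (fitted + y_A) + y_D * y_X"""
    return (1 - dummy) * (fitted + add_factor) + dummy * exog_value


fitted = 102.0       # hypothetical alpha-hat + beta-hat * X
add_factor = 1.5     # hypothetical judgmental adjustment, y_A
exog_value = 110.0   # hypothetical exogenized value, y_X

# y_D = 0: the econometric equation (plus add factor) determines y
active = behavioural_value(fitted, add_factor, dummy=0, exog_value=exog_value)
# y_D = 1: y is set to the exogenized value y_X
exogenized = behavioural_value(fitted, add_factor, dummy=1, exog_value=exog_value)
print(active, exogenized)
```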

3.6.4. The .show method#

The .show method returns:

  1. The description of the variable

  2. The normalized equation that is actually used in modelflow.

  3. A listing of the mnemonics and descriptions of the RHS variables

  4. The data of that variable, as well as the data of the RHS variables of the equation, drawn from both the .basedf and .lastdf DataFrames in the model object.

mpak.smpl(2020,2025) #change the actual sample range to limit the number of columns displayed
mpak.PAKNECONPRVTKN.show
show output

3.7. The ECM specification#

Many of the behavioural equations in World Bank models are written as Error Correction Models (ECMs).

The Error correction specification was developed to deal with two important problems in econometric equations.

  1. Many economic time series tend to increase over time. As a result, a regression of one series on another tends to have a good fit even if the two variables are not really connected economically. For example, the price of cookies tends to rise over time because of inflation. Similarly, the quantity of screws produced in the manufacturing sector tends to rise over time because of increased demand for manufactured goods. Regressing screw production on cookie prices will show a strong but spurious correlation.

  2. Purely short-run models that focus on growth rates or differences get around the problem of spurious correlation between two unrelated trending series. However, while these models explain short-run deviations, stringing the estimated growth rates together can produce implicit levels that become unstable in the forecast period and are not anchored to the long-run relationship between variables dictated by underlying economic theory.

The solution to these problems was the co-integration approach to econometrics and the associated ECM, which seeks to model both the long-run and the short-run relationships between variables.

The ECM specification used in World Bank models is a single equation approach that follows (Wickens and Breusch [1988]) and is comprised of two parts (the long run relationship, and the short-run relationship), which are estimated simultaneously.

Consider as an example two variables, say consumption and disposable income. Both have an underlying trend or, in the parlance, are integrated of order one. For simplicity we call them y and x.

For many of the variables in World Bank models, behavioural functions are estimated using an Error Correction Framework that splits the equation into a theoretically determined long run component and a more idiosyncratic short-run component.

Sometimes these equations (notably the mnemonics) can be difficult to read. However, with a bit of experience they become easier to read.

3.7.1. The short run relationship#

In its simplest form we might have a short run relationship between the growth rates of our two variables such that:

\[\Delta ln(Y_t) = \alpha + \beta \Delta ln(X_t) +\epsilon_t\]

or substituting lower case letters for the logged values.

\[\Delta y_t = \alpha + \beta \Delta x_t +\epsilon_t\]

3.7.2. The long run equation#

The long run relates the level of the two (or more) variables. A simplified version of that equation can be written as:

\[Y_t=\alpha X_t^\beta+ \eta_t\]

Rewriting this (in logarithms) it can be expressed as:

\[y_t = \ln(\alpha) + \beta x_t + \eta_t\]
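As a rough numerical illustration of the log transformation, the sketch below generates synthetic data from the long run equation and recovers \(ln(\alpha)\) and \(\beta\) by fitting a straight line to the logged series. The parameter values, the sample, and the assumption that the disturbance enters multiplicatively (so that it becomes additive after taking logs) are all hypothetical choices made for this example. The equations in World Bank models are estimated differently, as part of the full ECM.

```python
import numpy as np

# Hypothetical long run parameters, chosen only for illustration
alpha, beta = 2.0, 0.8

rng = np.random.default_rng(0)
X = np.linspace(50, 500, 200)               # synthetic level of X
eta = rng.normal(0.0, 0.01, size=X.size)    # small disturbance (assumed multiplicative)
Y = alpha * X**beta * np.exp(eta)           # long run relationship in levels

# In logs the relationship is linear: y = ln(alpha) + beta * x + eta
x, y = np.log(X), np.log(Y)
beta_hat, A_hat = np.polyfit(x, y, 1)       # returns the slope first, then the intercept

print(f"estimated beta      = {beta_hat:.3f} (true {beta})")
print(f"estimated ln(alpha) = {A_hat:.3f} (true {np.log(alpha):.3f})")
```

The fitted slope plays the role of \(\beta\) and the fitted intercept the role of A, the log of \(\alpha\).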

3.7.3. The long run equation in the steady state#

Note that in the steady state the expected value of the error term in the long run equation is zero (\(\eta_t=0 \)) so in those conditions the long run relationship can be simplified to:

\[y_t=\ln(\alpha)+\beta x_t + 0\]

or equivalently (substituting A for the log of \(\alpha\)):

\[y_t-A-\beta x_t=0\]

Moreover, if this expression is multiplied by some arbitrary constant, say \(-\lambda\), it would still equal zero:

\[-\lambda(y_t -A-\beta x_t)=0\]

and in the steady state this will also be true for the lagged variables:

\[-\lambda(y_{t-1}- A - \beta x_{t-1})=0\]

The part of the equation inside the parentheses is the re-normalized long-run equation. In the long run its expected value is zero, but at any given instant it could be different from zero. Its distance from zero at any point in time reflects how far the model is from equilibrium at that moment.

3.7.4. Putting it together#

From before we have the short run equation:

\[\Delta y_t = \alpha + \beta \Delta x_t +\epsilon_t\]

Inserting the steady state expression for the long-run into the short run equation makes no difference (in the long run) because in the long run it is equal to zero.

\[\Delta y_t = -\lambda(y_{t-1}-A-\beta x_{t-1}) + \alpha + \beta \Delta x_t +\epsilon_t\]

When the model is not in the steady state, the expression \(y_{t-1}-A-βx_{t-1}\) is of course the error term from the long run equation from the previous period (a measure of how far the dependent variable was from equilibrium).
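The combined equation can be iterated directly to see the error correction term at work. The following is a minimal sketch, not the solution method used by modelflow: the function name `simulate_ecm`, the parameter values and the path of x are hypothetical, shocks are switched off (\(\epsilon_t=0\)), and the short run constant \(\alpha\) is set to zero so that the long run relationship is the resting point.

```python
import numpy as np

def simulate_ecm(x, y0, lam, A, beta, alpha=0.0):
    """Iterate the ECM with no shocks (epsilon_t = 0); x and y are in logs."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = y0
    for t in range(1, len(x)):
        gap = y[t - 1] - A - beta * x[t - 1]          # last period's long run error
        y[t] = y[t - 1] - lam * gap + alpha + beta * (x[t] - x[t - 1])
    return y

# Hypothetical values: x grows by 0.02 a period, y starts 0.5 above equilibrium
A, beta, lam = 0.2, 0.9, 0.3
x = 4.0 + 0.02 * np.arange(30)
y = simulate_ecm(x, y0=A + beta * x[0] + 0.5, lam=lam, A=A, beta=beta)

gap = y - (A + beta * x)        # distance from the long run relationship
print(np.round(gap[:6], 4))
```

Under these assumptions the gap shrinks by the factor \((1-\lambda)\) each period whatever path x follows, because the \(\beta \Delta x_t\) term moves y one-for-one with the long run target.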

3.7.5. Lambda, the speed of adjustment#

The parameter \(\lambda\) can then be interpreted as the speed of adjustment. It determines what share of the previous period's error (distance from equilibrium) is absorbed in the following period. As long as \(\lambda\) is greater than zero and less than or equal to one, and there are no further disturbances (\(\epsilon_t=0\)), the expression multiplied by \(\lambda\) will gradually decline toward zero. How fast depends on how large or small \(\lambda\) is.

To be convergent \(\lambda\) must be between 0 and 2. If it is negative or greater than 2, the long run portion of the equation will cause the disequilibrium to grow each period rather than diminish. If \(\lambda\) is greater than 1 but less than 2 (\(1<\lambda<2\)), the disequilibrium will alternate between positive and negative values but will still converge.
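These cases follow from a simple recursion: with no new shocks, the long run error evolves as \(gap_t = (1-\lambda) gap_{t-1}\). The sketch below iterates that recursion for a few hypothetical values of \(\lambda\); the function name `gap_path` and the starting gap of 1 are assumptions made for this illustration.

```python
def gap_path(lam, gap0=1.0, periods=8):
    """Long run error when no new shocks arrive: gap_t = (1 - lam) * gap_{t-1}."""
    path = [gap0]
    for _ in range(periods - 1):
        path.append((1 - lam) * path[-1])
    return path

# Hypothetical speeds of adjustment, one for each case
for lam in (0.3, 1.5, 2.5, -0.2):
    print(f"lambda={lam:>4}:", [round(g, 3) for g in gap_path(lam)])
```

With \(\lambda=0.3\) the gap falls steadily toward zero, with \(\lambda=1.5\) it changes sign every period while shrinking, and with \(\lambda=2.5\) or \(\lambda=-0.2\) it grows.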

Intuitively, the lagged long-run error term measures how far the model was from equilibrium one period earlier (at t-1). The ECM term (multiplied by \(\lambda\)) ensures the model will gradually converge to equilibrium, the point at which the long run equation holds exactly, if \(\lambda\) is greater than zero but less than or equal to one. In these conditions, during each time period some portion \(\lambda\) of the previous period's disequilibrium will be absorbed. How much is absorbed depends on the size of the estimated speed of adjustment coefficient \(\lambda\).

An ECM equation can, therefore, be broken into two component parts. For the consumption function it will look something like this:

\[\Delta c_t = -\lambda \underbrace{\left(\log(C_{t-1})-\log(Wages_{t-1}-Taxes_{t-1}+Transfers_{t-1}) -\log(\alpha)\right)}_{\text{Long run}} +\beta \underbrace{\Delta x_t}_{\text{short run}}\]

The example below illustrates how different speeds of adjustment (\(\lambda\) = 0.3, 0.5 and 0.9) affect the error correction process.

With a slow speed of adjustment (\(\lambda\)=0.3), the equilibrium level of 50 is not achieved until around 2030 (<51) or 2032 (50.5). With \(\lambda\)=0.5 the gap is closed in around six years (2026=50.5), while with \(\lambda\)=0.9 it takes just two years (2022=50.3).

Note

Advanced formatting: The above example introduces some advanced formatting routines, using the pandas style property. For more see here.

Information on Python named colors is here.

import pandas as pd

# Years 2020-2050. The .upd() and .mfcalc() methods used below are the modelflow
# DataFrame methods made available by the imports at the start of this chapter.
ECMdf = pd.DataFrame({'E': 100}, index=list(range(2020, 2051)))

# Start all three series at 100, twice the equilibrium level of 50
ECMdf = ECMdf.upd('lbda3 lbda9 lbda5 = 100')

# Each year closes a share (0.3, 0.9 or 0.5) of the previous year's log gap to 50
ECMdf = ECMdf.mfcalc('''
<2021 2050> dlog(Lbda3) = -.3 * (log(Lbda3(-1))-log(50))
<2021 2050> dlog(Lbda9) = -.9 * (log(Lbda9(-1))-log(50))
<2021 2050> dlog(Lbda5) = -.5 * (log(Lbda5(-1))-log(50))
''')

ECMdf.plot(title="Error correction process for different speeds of adjustment")

[Figure: Error correction process for different speeds of adjustment]
    
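def color_proximity(val):
    """Return a CSS background color: the further val is above 50, the redder the cell."""
    if val > 80:
        color = "red"
    elif val > 70:
        color = "orangered"
    elif val > 55:
        color = "coral"
    elif val > 51:
        color = "lightsalmon"
    elif val > 50.5:
        color = "peachpuff"
    else:
        color = "white"
    return f'background-color: {color}'


# Show 2020-2035 with two decimals in a small font.
# In pandas 2.1 and later Styler.applymap is deprecated in favor of Styler.map.
ECMdf.loc[2020:2035,['LBDA3','LBDA5','LBDA9']].style.applymap(color_proximity).format(precision=2).set_table_attributes('style="font-size: 10px"')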
[Table: values of LBDA3, LBDA5 and LBDA9 for 2020-2035, shaded by distance from the equilibrium level of 50]
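The same paths can be cross-checked without modelflow. The sketch below assumes that each dlog equation is solved exactly for the level, so that the log gap to 50 shrinks by a factor of \((1-\lambda)\) every year. The helper name `ecm_path`, the frame name `check` and the 1% tolerance are hypothetical choices made for this check.

```python
import numpy as np
import pandas as pd

def ecm_path(lam, start=100.0, target=50.0, years=range(2020, 2051)):
    """Level implied by dlog(X) = -lam * (log(X(-1)) - log(target))."""
    n = np.arange(len(years))                        # years elapsed since the start
    log_gap = np.log(start / target) * (1 - lam) ** n
    return pd.Series(target * np.exp(log_gap), index=list(years))

speeds = {"LBDA3": 0.3, "LBDA5": 0.5, "LBDA9": 0.9}
check = pd.DataFrame({name: ecm_path(lam) for name, lam in speeds.items()})

# First year in which each series is within 1% of the equilibrium level of 50
first_year_close = (check <= 50 * 1.01).idxmax()

print(check.loc[2020:2035].round(2))
print(first_year_close)
```

The rounded values in `check` can be compared row by row with the styled table produced by the modelflow version.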